Research and Development Workflow on Bracewell
Connecting to Bracewell
CSIRO VPN
Request VPN access. This is easily done on-site, but can also be done off-site (https://security.csiro.au/offsite.php).
You will most likely be redirected to https://vpn.csiro.au/. When you log in, it will attempt to install the "Cisco AnyConnect Secure Mobility Client", and fail. While you can install this manually, I find the best thing is to install the open-source alternative, openconnect:
$ pacin openconnect
You can now initiate VPN connections from the command line with openconnect:
$ sudo openconnect vpn.csiro.au
The NetworkManager plugin for openconnect makes the VPN really nice and easy to work with:
$ pacin networkmanager-openconnect
(You may need to log out and log back in.)
SSH
$ ssh -X ident@bracewell.hpc.csiro.au
You can also use the -Y flag (trusted X11 forwarding) instead of -X.
It is a good idea to set up password-less SSH with public-key authentication.
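As a rough sketch of what that setup might look like, the stanza below gives the login host and the interactive node (introduced below) short aliases with X11 forwarding turned on; ForwardX11 is the config-file counterpart of -X. It is written to a scratch file so it can be reviewed before it goes anywhere near ~/.ssh/config. The alias names, the scratch file name bracewell_ssh_config and the ed25519 key path are assumptions, and ident stands for your own ident as above.

```shell
# Write the stanza to a scratch file for review (the file name is arbitrary).
cat > bracewell_ssh_config <<'EOF'
Host bracewell
    HostName bracewell.hpc.csiro.au

Host bracewell-i1
    HostName bracewell-i1.hpc.csiro.au

# Settings shared by both aliases; replace "ident" with your own ident.
Host bracewell bracewell-i1
    User ident
    ForwardX11 yes
    IdentityFile ~/.ssh/id_ed25519
EOF

# Once reviewed, append it to your real SSH config:
#   cat bracewell_ssh_config >> ~/.ssh/config
```

With a key pair generated by ssh-keygen -t ed25519 and the public key installed with ssh-copy-id ident@bracewell.hpc.csiro.au, ssh bracewell should then log in without a password prompt, assuming key-based logins are permitted on the cluster.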
Interactive node
Use an interactive node for persistence, moderate computing and a well-integrated experience. The interactive node for Bracewell is:
bracewell-i1.hpc.csiro.au
Module environment
$ module avail
$ module list
Currently Loaded Modulefiles:
  1) SC                    3) cuda-driver/current   5) intel-fc/16.0.4.258
  2) slurm/16.05.8         4) intel-cc/16.0.4.258
$ module show tensorflow/1.3.0-py35-gpu
-------------------------------------------------------------------
/var/apps_bracewell/modules/modulefiles/tensorflow/1.3.0-py35-gpu:

prepend-path PATH /apps/tensorflow/1.3.0-py35-gpu/bin
prepend-path CPATH /apps/tensorflow/1.3.0-py35-gpu/include
prepend-path PKG_CONFIG_PATH /apps/tensorflow/1.3.0-py35-gpu/lib/pkgconfig
prepend-path LD_RUN_PATH /apps/tensorflow/1.3.0-py35-gpu/lib
prepend-path PYTHONPATH /apps/tensorflow/1.3.0-py35-gpu/lib/python3.5/site-packages
setenv TENSORFLOW_HOME /apps/tensorflow/1.3.0-py35-gpu
load-module python/3.6.1
load-module cuda/8.0.61
load-module cudnn/v6
-------------------------------------------------------------------
$ module load tensorflow/1.3.0-py35-gpu
$ module list
Currently Loaded Modulefiles:
  1) SC                    5) intel-fc/16.0.4.258    9) cudnn/v6
  2) slurm/16.05.8         6) intel-mkl/2017.2.174  10) tensorflow/1.3.0-py35-gpu
  3) cuda-driver/current   7) python/3.6.1
  4) intel-cc/16.0.4.258   8) cuda/8.0.61
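The prepend-path lines in the module show output are the core of what module load does: each one puts a directory at the front of a colon-separated environment variable. The function below is only an illustrative stand-in for that behaviour, not how the module command is actually implemented; prepend_path and DEMO_PATH are made-up names.

```shell
# Illustrative stand-in for a modulefile's "prepend-path" directive.
prepend_path() {
    # $1 = name of the variable, $2 = directory to place at the front
    eval "_old=\${$1-}"          # read the current value (empty if unset)
    if [ -n "$_old" ]; then
        eval "$1=\$2:\$_old"     # new entry first, old value after the colon
    else
        eval "$1=\$2"            # variable was empty: no colon needed
    fi
    export "$1"
}

# Demonstrate on a scratch variable rather than the real PATH.
DEMO_PATH=/usr/bin
prepend_path DEMO_PATH /apps/tensorflow/1.3.0-py35-gpu/bin
echo "$DEMO_PATH"    # prints /apps/tensorflow/1.3.0-py35-gpu/bin:/usr/bin
```

Because new entries go to the front, whatever was loaded most recently is found first, which is why the order of module load commands can matter.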
Customizing your shell
$ cd ~
$ git clone https://github.com/ltiao/dotfiles.git
$ cd dotfiles
$ source bootstrap.sh
$ rm ~/.hushlogin # I don't want to disable the Bracewell ASCII art login screen
While you're here, you might like to set up virtualenvwrapper. Add something like the following to your ~/.bash_profile:
module load python/3.6.1
export WORKON_HOME=$HOME/.virtualenvs
source $(which virtualenvwrapper_lazy.sh)
Now you can create Python virtualenvs as usual with
$ mkvirtualenv --system-site-packages <virtual-env-name>
Note
Some work still needs to be done to determine how system-wide installed packages are affected when they are upgraded in a virtualenv. For example, I noticed that Keras==2.0.3 was installed in the system site packages. When I execute pip install Keras==2.0.8, does the version installed in the virtualenv then take precedence and override the system-wide version? I assume this is the case.
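One way to check this empirically is to ask the interpreter which copy it actually imports. The helper below is a small sketch: which_pkg is a made-up name, and PYTHON is an assumed override for choosing the interpreter (python3 by default). Run inside the activated virtualenv, the printed path shows whether the import resolves to the virtualenv's site-packages or to a system-wide directory.

```shell
# Print the version and file location of the copy of a package that
# Python would import in the current environment.
which_pkg() {
    # $1 = importable module name, e.g. keras
    "${PYTHON:-python3}" -c '
import importlib, sys
mod = importlib.import_module(sys.argv[1])
# Not every module defines __version__, so fall back to "unknown".
print(getattr(mod, "__version__", "unknown"), mod.__file__)
' "$1"
}
```

For the case above, which_pkg keras would print the Keras version together with the path it was loaded from.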
Some dependencies for zsh are missing so you're stuck with Bash.
Note
In batch jobs that use bash (i.e. sinteractive) or if the script you run with sbatch begins with #!/bin/bash, your ~/.bash_profile will be invoked.
Slurm batch system
Time limits:
A time limit of zero requests that no time limit be imposed. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
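To make the accepted formats concrete, the sketch below converts a time limit string into seconds. slurm_time_to_seconds is a hypothetical helper, not part of Slurm; it does no validation and simply follows the six formats listed above.

```shell
# Convert a Slurm-style time limit into seconds (illustrative only).
slurm_time_to_seconds() {
    echo "$1" | awk '
    {
        days = 0
        spec = $0
        if (index(spec, "-") > 0) {
            # days-hours[:minutes[:seconds]]
            split(spec, dh, "-")
            days = dh[1]
            split(dh[2], f, ":")
            h = f[1]; m = f[2]; s = f[3]    # missing fields count as 0
        } else {
            n = split(spec, f, ":")
            if (n == 1)      { h = 0;    m = f[1]; s = 0 }      # minutes
            else if (n == 2) { h = 0;    m = f[1]; s = f[2] }   # minutes:seconds
            else             { h = f[1]; m = f[2]; s = f[3] }   # hours:minutes:seconds
        }
        print days * 86400 + h * 3600 + m * 60 + s
    }'
}

slurm_time_to_seconds 30         # minutes                -> 1800
slurm_time_to_seconds 2:00:00    # hours:minutes:seconds  -> 7200
slurm_time_to_seconds 1-12       # days-hours             -> 129600
```

Note that a bare number means minutes, not seconds, so a limit of 30 asks for half an hour.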
Interactive batch shell
Batch job scripts
$ sinteractive -h
Usage: sinteractive [-n] [-t] [-p] [-J] [-w] [-g]
Optional arguments:
    -n: Number of tasks to request (default: 1)
        Consider the number of processes you need to run.
    -c: Number of CPUs per task to request (default: 1)
        Consider the number of threads each process requires.
        A combination of number of tasks and CPUs per task are typically
        required for hybrid codes (multi-processes and multi-threads).
        Note that MATLAB's parallel processing requires this to be set.
        e.g. -n 1 -c 4 will allow MATLAB's "local" cluster profile to use 4 workers.
    -t: Wall time to request (default: 2:00:00)
    -m: Memory to request (no default)
    -p: Partition to run job in (no default)
    -J: Job name (default: interactive)
    -w: Node name (no default)
    -g: Request a generic resource e.g. gpu:2
NB: The command that is actually run is printed first so you can copy it
    and run fancier versions with more salloc and srun options if necessary.
e.g. sinteractive -n 2 -t 1:00:00 -m 2gb
tia00c at bracewell-i1 in ~
$ sinteractive -n 1 -c 1 -m 50mb -g gpu:1 -t 00:00:30
running: salloc --ntasks-per-node 1 --cpus-per-task=1 --mem 50mb -J interactive -t 00:00:30 --gres gpu:1 srun --pty /bin/bash -l
salloc: Granted job allocation 8186661
srun: Job step created
tia00c at b043 in ~
$
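For non-interactive work, the same kind of resource request can be written as #SBATCH directives at the top of a batch job script. The following is a minimal sketch rather than a tested Bracewell recipe: the resource values are placeholders in the spirit of the sinteractive options above, train.py is a hypothetical script, and job.sh is an arbitrary file name.

```shell
# Write a minimal batch job script to job.sh in the current directory.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=train
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G
#SBATCH --gres=gpu:1

# Load the software stack, then run the (hypothetical) training script.
module load tensorflow/1.3.0-py35-gpu
python train.py
EOF
```

Submit it with sbatch job.sh; squeue -u $USER then shows its state in the queue.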
Storage and persistence
Bracewell landing page
Transferring and synchronizing files
Nautilus